# 16kHz audio adaptation

Viwav2vec2 Base 3k

A Wav2Vec2 base model pre-trained on 3,000 hours of Vietnamese speech data. It targets Vietnamese speech recognition and must be fine-tuned on a downstream task before use.

Speech Recognition

Transformers Other

Data2vec Audio Large 100h

Data2Vec is a general self-supervised learning framework applicable to speech, natural language processing, and computer vision tasks. This large model was pre-trained and fine-tuned on 100 hours of LibriSpeech audio data.

Speech Recognition

Transformers English

Wav2vec2 Xlsr Multilingual 53 Fa

A multilingual speech recognition model based on the wav2vec 2.0 architecture and fine-tuned specifically for Persian to reduce word error rate.

Speech Recognition

Wav2vec2 Large Xlrs Estonian

This is an automatic speech recognition (ASR) model fine-tuned on the Estonian Common Voice dataset, based on the facebook/wav2vec2-large-xlsr-53 model.

Speech Recognition Other

Wav2vec2 Base Hr Voxpopuli V2

A speech model based on Facebook's Wav2Vec2 architecture, pre-trained on the Croatian VoxPopuli corpus.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr 53 Breton

A Breton speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53.

Speech Recognition Other

Wav2vec2 Large Xlsr 53 Hungarian

This is a Hungarian automatic speech recognition model fine-tuned from the facebook/wav2vec2-large-xlsr-53 model, trained using the Common Voice dataset.

Speech Recognition Other

W2v Hf Commonvoice From Xlsr53 Pretrain 0329UTC1500

A speech recognition model based on facebook/wav2vec2-large-xlsr-53 and fine-tuned on the Common Voice Japanese dataset.

Speech Recognition

Wav2vec2 Large 960h Lv60

Wav2Vec2 learns speech representations from raw audio through self-supervised learning, which allows it to reach strong speech recognition accuracy with limited labeled data.

Speech Recognition English

Wav2vec2 Large Xlsr Georgian

A Georgian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53. It expects speech input sampled at 16 kHz.

Speech Recognition

Transformers Other
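Models on this page expect 16 kHz input, so audio recorded at another rate has to be resampled first. The following is a minimal sketch of that step using plain linear interpolation. The function name `resample_linear` and its parameters are illustrative assumptions, not part of any model's API, and linear interpolation is a stand-in here: a production pipeline would normally use a band-limited resampler from an audio library to avoid aliasing.

```python
def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample a mono signal to dst_rate by linear interpolation.

    Illustrative only: no low-pass filtering is applied, so downsampling
    audio with high-frequency content will alias.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    samples = list(samples)
    if not samples or src_rate == dst_rate:
        return samples

    # Number of output samples that cover the same duration.
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step                 # fractional position in the source
        lo = min(int(pos), last)       # left neighbour
        hi = min(lo + 1, last)         # right neighbour (clamped at the end)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


if __name__ == "__main__":
    one_second_48k = [float(i) for i in range(48_000)]
    resampled = resample_linear(one_second_48k, 48_000)
    print(len(resampled))  # one second of audio at 16 kHz
```

The output length is the only thing the model strictly depends on here; audio quality after resampling depends on the resampler chosen.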

Wav2vec2 Large Xlsr 53 Chuvash

A Chuvash automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on the Common Voice dataset with a word error rate of 40.01%.

Speech Recognition Other
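Several entries on this page quote a word error rate (WER). As a reference for reading those numbers, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the model output, divided by the number of reference words. The sketch below implements that definition. The function name `word_error_rate` is hypothetical, and the whitespace tokenization is an assumption: individual model cards may normalize case and punctuation differently before scoring.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"{wer:.2%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.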

Wav2vec2 Large Xlsr 53 German

A German speech recognition model fine-tuned from Facebook's wav2vec2-large-xlsr-53 model on the Common Voice 6.1 German dataset.

Speech Recognition German

Wav2vec2 Base Vn 270h

A Vietnamese automatic speech recognition model fine-tuned on approximately 270 hours of annotated Vietnamese speech.

Speech Recognition Other

Wav2vec2 Large Superb Ks

A speech classification model based on the Wav2Vec2-Large-LV60 pre-trained model and fine-tuned on the SUPERB keyword spotting task.

Speech Recognition

Transformers English

Wav2vec2 Large Xlsr 53 Estonian

An Estonian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice dataset.

Speech Recognition

Transformers Other

Wav2vec2 Base Da Voxpopuli V2

A speech model based on Facebook's Wav2Vec2 architecture, pre-trained only on Danish using 13.6k hours of unlabeled data from the VoxPopuli corpus.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr 53 Estonian

An Estonian speech recognition model fine-tuned from Facebook's XLSR-53 large model, achieving a 30.74% word error rate on the Common Voice dataset.

Speech Recognition Other

Wav2vec2 Xlsr 53 Tamil

A Tamil speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on the Common Voice Tamil dataset.

Speech Recognition Other

Wav2vec2 Large Xlsr 53 Spanish

A Spanish speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on the Common Voice 6.1 Spanish dataset.

Speech Recognition Spanish

Wav2vec2 Large West Germanic Voxpopuli V2

Facebook's Wav2Vec2 large model, pre-trained exclusively on 66.3k hours of unlabeled West Germanic speech from the VoxPopuli corpus.

Speech Recognition

Wav2vec2 Large El Voxpopuli V2

A Greek Wav2Vec2 large speech model pre-trained on 17.7k hours of unlabeled data from the VoxPopuli corpus.

Speech Recognition

Transformers Other

Sew D Tiny 100k

SEW-D is a compressed and efficient speech pre-training model developed by ASAPP Research, pre-trained on 16kHz sampled speech audio, suitable for various downstream speech tasks.

Speech Recognition

Transformers English

Wav2vec2 Large Xlsr 53 Mongolian

An automatic speech recognition model based on facebook/wav2vec2-large-xlsr-53 and fine-tuned on the Common Voice Mongolian dataset.

Speech Recognition

Transformers Other

Wav2vec2 Large Fr Voxpopuli French

A French speech recognition model fine-tuned from facebook/wav2vec2-large-fr-voxpopuli on the Common Voice 6.1 French dataset. It expects audio input sampled at 16 kHz.

Speech Recognition French

Wav2vec2 Large Xlsr 53 Sakha

A Yakut (Sakha) speech recognition model fine-tuned from the XLSR-53 large model, with a 32.23% word error rate.

Speech Recognition Other

Wav2vec2 Large Xlsr 53 Vietnamese

A Vietnamese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53. It expects audio input sampled at 16 kHz.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr Vietnamese

A Vietnamese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53.

Speech Recognition Other

Wav2vec2 Large Xlsr 53 Lithuanian

A Lithuanian speech recognition model fine-tuned from Facebook's XLSR-53 large model, trained on the Common Voice dataset with a test WER of 56.55%.

Speech Recognition Other

© 2025 AIbase